Abstract. The objective of this article is to present the development and evaluation of dETECT (Evaluating TEaching CompuTing), a model for the evaluation of the quality of instructional units for teaching computing in middle school based on the students’ perception collected through a measurement instrument. The dETECT model was systematically developed and evaluated based on data collected from 16 case studies in 13 different middle school institutions with responses from 477 students. Our results indicate that the dETECT model is acceptable in terms of reliability (Cronbach’s alpha α = .787) and construct validity, demonstrating an acceptable degree of correlation between almost all items of the dETECT measurement instrument. These results allow researchers and instructors to rely on the dETECT model in order to evaluate instructional units and, thus, contribute to their improvement and to direct an effective and efficient adoption of teaching computing in middle school.

Keywords: computing, evaluation, instructional unit, middle school. 


1. Introduction 


Teaching computing through summer camps, clubs or family workshops is a worldwide trend (Gresse von Wangenheim and Wangenheim, 2014). There are several initiatives to teach computing, such as Code.org (http://www.code.org), Code.club (https://www.codeclubworld.org) and Computing at Schools (http://www.computacaonaescola.ufsc.br/), among others. These initiatives are expected to contribute to the popularization of computing competencies as well as to the awareness and interest of students towards computing (Guzdial et al., 2014; Garneli et al., 2015).

Taking into consideration the growing number of alternative instructional units (IUs) for teaching computing, it is important to obtain evidence on their expected benefits as a basis for their systematic selection, adoption and improvement (Decker et al., 2016). Following Guzdial (2004), a main contribution to this knowledge area is not necessarily the development of new programming environments or instructional units, but finding out how to study the existing ones. A more precise understanding of the results of using these instructional units would make it possible to know whether they in fact contribute positively to the achievement of the learning goals and compensate for the cost involved in their adoption. However, although there is evidence that existing IUs can improve the teaching and learning process in middle school, and although they are being used more widely in schools worldwide, there is little research on the analysis of the contribution that these IUs can bring to education (Decker et al., 2016).

Currently, the evaluation of the quality of IUs is limited or sometimes even non-existent (Decker et al., 2016; Garneli et al., 2015). In many cases, a decision about the use of IUs is based on assumptions of their effectiveness (Gross and Powers, 2005; Wilson et al., 2010). On the other hand, some studies focus on specific quality factors only, such as learning improvement (Gross and Powers, 2005; Kalelioglu and Gülbahar, 2014). Other studies focus on the effectiveness of visual block-based programming languages (Weintrop and Wilensky, 2015; Grover et al., 2014; Perdikuri, 2014). However, students’ perceptions and intentions are also determining factors for successful learning (Giannakos et al., 2013). Yet, few evaluations take into consideration aspects such as motivation and the students’ experience during the instructional unit (Craig and Horton, 2009; Giannakos et al., 2014), or students’ attitudes toward technology acceptance (Giannakos et al., 2013). In addition, studies that measure students’ attitude toward computing are mostly designed for higher education and seem to be outdated in the current context of teaching computing in schools (Garland and Noyes, 2008).

The measurements used to evaluate the quality of IUs to teach computing vary widely, ranging from generic scales of students’ attitudes toward computing to measurement instruments developed in an ad-hoc way. Many measurements are developed without the definition of a model to derive the items of the measurement instrument based on theoretical constructs, which may make the validity of the results questionable. Thus, there is currently a lack of systematically developed and evaluated evaluation models and/or measurement instruments that are widely accepted to evaluate the quality of IUs for teaching computing in schools. Such evaluation models have to take into consideration the characteristics of these IUs, which are typically performed more informally, for example, as programming workshops for parents and children outside the school environment. In such a context, it may be impracticable to carry out experiments that require pre-tests and the inclusion of control groups, as these cause a major interruption and affect the fun factor of the workshop. A more viable alternative may be to conduct case studies, in which the evaluation of the IU is performed only at the end of the workshop/course (post-test), typically through a questionnaire to obtain the students’ perceptions (Wohlin et al., 2012). An advantage of this study type is that the evaluation can be performed with little effort and in a non-intrusive way at the end of the instructional unit. Studies based on the measurement of perceptions using questionnaires are conducted in a variety of research areas, providing reliable, valid and useful information (DeVellis, 2016; Takatalo et al., 2010; Sweetser and Wyeth, 2005; Poels et al., 2007). Thus, the objective of this article is to present the development and evaluation of dETECT (Evaluating TEaching CompuTing), a model for the evaluation of the quality of instructional units for teaching computing in schools based on the students’ perception.


2. Research Method 

In order to develop a model for the evaluation of instructional units for teaching computing, applied research was carried out (Miller and Salkind, 2002), divided into four stages (Fig. 1):

• Stage 1. Literature review 

• Stage 2. Development of the dETECT Evaluation Model

• Stage 3. Design of the measurement instrument 

• Stage 4. Application and evaluation of the measurement instrument 

Stage 1. Literature review. In a first exploratory stage, we conducted a review of the literature related to evaluation models of instructional units for teaching computing in schools.


[Figure: flow of the four research stages. Stage 1, identification of evaluation models of instructional units for teaching computing in schools: literature review. Stage 2, definition of an evaluation model: Goal/Question/Metric approach. Stage 3, design of measurement instruments: questionnaire design; face validity (expert panel). Stage 4, application and evaluation of the measurement instrument: case study (physical computing workshops, interdisciplinary game programming, app development workshops); statistical analysis (Cronbach's alpha, item-total correlation, factor analysis).]

Fig. 1. Research method.

Stage 2. Development of the dETECT Evaluation Model. Based on the results of the literature review, we systematically developed the dETECT evaluation model for measuring the quality of instructional units for teaching computing based on the perceptions of the students and their parents. For this purpose, we used GQM - Goal Question Metric (Basili et al., 1994), a popular approach to measure diverse quality attributes. Using GQM, we systematically defined the evaluation objective(s) and decomposed the objective into analysis questions and measures.

Stage 3. Design of the measurement instrument. In order to operationalize the measurement, a questionnaire was developed by a multidisciplinary team, based on methods for scale and questionnaire development (DeVellis, 2016; Krosnick and Presser, 2010; Malhotra, 2008; Kasunic, 2005). For each of the defined measures, questionnaire items were defined, also based on similar studies found in the literature that were considered pertinent to the context of this study and to the defined measurement plan. The questionnaire was revised and piloted with a small sample of the target audience.

Stage 4. Application and evaluation of the measurement instrument. A case study (Yin, 2009; Wohlin et al., 2012) was conducted in order to evaluate the measurement instrument in terms of reliability and construct validity. For the definition of the evaluation, we used the GQM approach (Basili et al., 1994). The objective of the study was decomposed into quality factors and analysis questions, also in accordance with methods for scale development (Carmines and Zeller, 1979; DeVellis, 2016; Trochim and Donnelly, 2008). During the case study, the dETECT measurement instrument was applied as part of the evaluation of 16 computing courses/workshops carried out in different educational institutions in order to collect the required data. The pooled data were analyzed in order to answer our analysis questions, following the definitions of Trochim and Donnelly (2008) and the scale development guide proposed by DeVellis (2016). In terms of reliability, internal consistency is typically measured based on the correlations between different items of the same measurement instrument (Carmines and Zeller, 1979; Trochim and Donnelly, 2008). Internal consistency is usually measured through Cronbach’s alpha, a popular method to assess the reliability of a measurement instrument (Carmines and Zeller, 1979). In terms of construct validity, convergent and discriminant validity are the two subtypes of validity that make up construct validity (Trochim and Donnelly, 2008). Convergent validity refers to the degree to which two items of quality factors that theoretically should be related are in fact related. In contrast, discriminant validity tests whether concepts or measurements that are supposed to be unrelated are in fact unrelated (Trochim and Donnelly, 2008). In order to analyze the convergent and discriminant validity of the dETECT measurement instrument, the intercorrelations of the items and the item-total correlations are calculated (DeVellis, 2016). Intercorrelation refers to the degree of correlation between the items of a measurement instrument (Carmines and Zeller, 1979; DeVellis, 2016). The higher the correlations among items that measure the same quality factor, the higher the validity of the individual items and, hence, the validity of the instrument as a whole. Item-total correlation is analyzed in order to check whether any item in the measurement instrument is inconsistent with the averaged correlation of the others and thus can be discarded (Carmines and Zeller, 1979; DeVellis, 2016).

In addition, we used factor analysis to determine how many factors underlie the set of items of the dETECT measurement instrument, following the analysis process proposed by Brown (2006). Each factor is defined by those items that are more highly correlated with each other than with other items. A statistical indication of the extent to which each item is correlated with each factor is given by the factor loading. Thus, the higher the factor loading, the more the particular item contributes to the given factor. Factor analysis therefore also explicitly takes into consideration the fact that the items measure a factor unequally (Carmines and Zeller, 1979).

This research was approved by the Ethics Committee of the Federal University of 
Santa Catarina (No. 1021541). 


3. The Evaluation Model dETECT (Evaluating TEaching CompuTing) 

The objective of the dETECT model is to analyze instructional units in order to evaluate
their quality in terms of the quality of the IU, the computing experience and the perception of
learning, from the learners’ perspective in the context of teaching computing in middle
school. From this objective, the analysis questions and measures are derived based on
literature (Fig. 2) (Keller, 1987; Sweetser and Wyeth, 2005; Poels et al., 2007; Takatalo 
et al., 2010; Ericson and McKlin, 2012; Tangney et al., 2010; Wiebe et al., 2003; Papas- 
tergiou, 2008; Sanchez-Franco, 2010; Giannakos et al., 2013; Makris et al., 2013; Shih, 
2008; Sivilotti and Laugel, 2008; Lai and Lai, 2012; Lee et al., 2009; Savi et al., 2012; 
Kwon et al., 2012). 

In a general way, following the definition proposed by Wiggins and McTighe (2005), an IU is a set of lessons carefully designed to collectively achieve a selected group of learning objectives for a target audience. The unit consists of a coherent set of materials designed to support student learning in a specific educational context and offers goals, assessment tasks, instruction, implementation procedures, and resources. However, due to the lack of a definition of an IU for teaching computing in schools, based on the literature review (Keller, 1987; Sweetser and Wyeth, 2005; Poels et al., 2007; Takatalo et al., 2010; Ericson and McKlin, 2012; Tangney et al., 2010; Wiebe et al., 2003; Papastergiou, 2008; Sanchez-Franco, 2010; Giannakos et al., 2013; Makris et al., 2013; Shih, 2008; Sivilotti and Laugel, 2008; Lai and Lai, 2012; Lee et al., 2009; Savi et al., 2012; Kwon et al., 2012) we consider that an instructional unit (workshop, course, etc.) with quality achieves its learning objectives, promotes pleasant activities, facilitates learning, and creates a positive perception of and interest in computing.

[Figure: the perception of quality is decomposed into three quality factors: quality of the instructional unit, computing experience, and perception of learning.]

Fig. 2. Decomposition of the quality factors. Source: authors.


Table 1

Measurement instrument

No.  Description                                                 Response format

Quality of the Instructional Unit
1    The workshop/course was:                                    (1) Lot of fun  (2) Fun  (3) Annoying  (4) Very annoying
2    The time of the workshop/course passed:                     (1) Very quickly  (2) Quickly  (3) Slowly  (4) Very slowly
3    The workshop/course was:                                    (1) Excellent  (2) Good  (3) Regular  (4) Bad

Computing Experience
4    I will show my computer program to others:                  (1) Yes  (2) No
5    I want to learn more about how to make computer programs:   (1) Yes  (2) No
6    Making a computer program is:                               (1) Lot of fun  (2) Fun  (3) Annoying  (4) Very annoying
7    I like to make computer programs:                           (1) Yes  (2) No
8    Computing is useful in everyday life:                       (1) Yes  (2) No
9    I want to learn more about how to make computer programs:   (1) Yes  (2) No

Perception of Learning
10   The workshop/course was:                                    (1) Very easy  (2) Easy  (3) Difficult  (4) Very difficult
11   I can write computer programs:                              (1) Yes  (2) No
12   I can explain to a friend how to make a computer program:   (1) Yes  (2) No
13   Making a computer program is:                               (1) Very easy  (2) Easy  (3) Difficult  (4) Very difficult

The measurement is operationalized by the development of a questionnaire to be answered by the students at the end of the instructional unit, in order to obtain their perception of the quality of the instructional unit. The items that compose the questionnaire (Table 1) are defined for each of the measures and are derived from similar studies found in the literature that were considered pertinent to the context of this research (Keller, 1987; Sweetser and Wyeth, 2005; Poels et al., 2007; Takatalo et al., 2010; Ericson and McKlin, 2012; Tangney et al., 2010; Wiebe et al., 2003; Papastergiou, 2008; Sanchez-Franco, 2010; Giannakos et al., 2013; Makris et al., 2013; Shih, 2008; Sivilotti and Laugel, 2008; Lai and Lai, 2012; Lee et al., 2009; Savi et al., 2012; Kwon et al., 2012).

The complete material of the dETECT model is available at: http://www.computacaonaescola.ufsc.br/?page_id=45.


4. Definition and Execution of the Evaluation of the dETECT Model 

When developing evaluation models and questionnaires, it is fundamental to analyze whether they are measuring what is intended (construct validity) and whether the same measurement process produces the same results (reliability) (Carmines and Zeller, 1979). Therefore, we evaluated the measurement instrument of the dETECT model in terms of reliability and construct validity from the viewpoint of researchers in the context of instructional units for teaching computing in school. The following analysis questions are taken into consideration:

Reliability 

AQ1: Is there evidence for internal consistency of the dETECT measurement instrument?

Construct Validity 

AQ2: Is there evidence of the convergent and discriminant validity of the dETECT 
measurement instrument? 

AQ3: How do underlying factors influence the responses on the items of the dETECT 
measurement instrument? 

For the evaluation of the dETECT model, 16 case studies were performed applying three different instructional units in 13 different educational institutions between 2015 and 2016, involving a total of 477 students (Table 2). The measurement took place at the end of instructional units teaching computing, either in the form of short 4-hour workshops or as part of interdisciplinary school units lasting 10-12 weeks (with 2 hours weekly). The units were applied at the middle school level with children aged 10 to 14.

The target audience is middle school students in Brazil, covering different types of activities during the regular school schedule as well as extracurricular workshops. The instructional units aim at teaching computing with a focus on programming and computational thinking (Table 3). More information on the instructional units is available at: http://www.computacaonaescola.ufsc.br.

Table 2

Summary of case studies

Instructional unit / Institution and date                                                      No. of students

Physical Computing Workshop
  INE/UFSC - Florianopolis - August 8, 2015                                                      8
  Escola Hamonia - Ibirama/SC - August 29, 2015                                                 13
  INE/UFSC - Florianopolis - October 17, 2015                                                   14
  IFSC - Gaspar/SC - October 20 and 22, 2015                                                    32
  Escola Sabedoria Junior - Florianopolis/SC - November 4, 2015                                 22
  INE/UFSC - Florianopolis - November 7, 2015                                                   16
  INE/UFSC - Florianopolis - November 14, 2015                                                  15

Games with Scratch (Interdisciplinary Course)
  Turmas 5Mat, 5 Vesp, 7A, 7B - Escola Autonomia, Florianopolis/SC - 2015                       99
  Escola Basica Municipal Prefeito Reinaldo Weingartner, Palhoça/SC - 2015                      25
  EEB Prof Vitorio Anacleto Cardoso, Gaspar/SC - 2015                                           43
  EEB Zenaide Schmitt Costa, Gaspar/SC - 2015                                                   31
  EEB Luiz Franzoi, Gaspar/SC - 2015                                                            15
  EEB Ferandino Dagnoni, Gaspar/SC - 2015                                                       46
  EEB Prof Dolores dos Santos Krauss, Gaspar/SC - 2015                                          14
  EEB Norma Monica Sabel, Gaspar/SC - 2015                                                      49

App Inventor Workshop
  Escola Basica Prof.ª Herondina Medeiros Zeferino - 2016                                       35

Total                                                                                          477


Table 3

Overview of the instructional units applied

Physical Computing Workshop: Integrating Scratch/Snap! with Arduino and pieces of hardware in a low-cost solution, students learn to program an interactive robot.

Games with Scratch: In an interdisciplinary way, students learn basic computer concepts by programming games involving different contents (e.g. history, Portuguese language, geography, etc.) using Scratch.

App Inventor Workshop: Students learn how to program a mobile app game using App Inventor.


4.1. Analysis


In order to obtain greater precision and statistical power through a larger sample size, 
the data collected in the 16 case studies were pooled to answer the defined analysis 
questions. 

Reliability 

AQ1: Is there evidence for internal consistency of the dETECT measurement instrument?

In order to answer this question, we evaluated the internal consistency of the dETECT measurement instrument through Cronbach’s alpha coefficient (DeVellis, 2016; Trochim and Donnelly, 2008). Cronbach’s alpha coefficient (Cronbach, 1951) indirectly indicates the degree to which a set of items measures a single quality factor. Thus, we want to know whether the items of the dETECT measurement instrument measure the same quality factor, the perception of the quality of the instructional unit. Typically, values of Cronbach’s alpha ranging from 0.70 to 0.95 are considered acceptable (DeVellis, 2016), indicating internal consistency of the instrument.

Analyzing the 13 items of the measurement instrument (Table 1), the value of Cronbach’s alpha is acceptable (α = .787). We can thus conclude that the answers to the items are consistent and precise, indicating the reliability of the items of the dETECT measurement instrument.

Construct Validity 

AQ2: Is there evidence of the convergent and discriminant validity of the dETECT 
measurement instrument? 

Construct validity of a measurement instrument refers to the ability to actually measure what it purports to measure (Carmines and Zeller, 1979; Trochim and Donnelly, 2008). Convergent and discriminant validity are the two subtypes of validity that make up construct validity (Trochim and Donnelly, 2008). Convergent validity shows that the items that should be related are in reality related. On the other hand, discriminant validity shows that the items that should not be related are in reality not related (Carmines and Zeller, 1979; Trochim and Donnelly, 2008). In order to obtain evidence of the convergent and discriminant validity of the items of the dETECT measurement instrument, the intercorrelations of the items and the item-total correlations are calculated (DeVellis, 2016).

Intercorrelations of the items. In order to analyze the intercorrelations between the items, we used nonparametric Spearman correlation matrices (Table 4). The matrices show the Spearman correlation coefficient, indicating the degree of correlation between two items (item pairs). We used this correlation coefficient as it is the most appropriate correlation analysis for Likert scales (Trochim and Donnelly, 2008). The correlation coefficients between items of the same dimension are colored. According to Cohen (1988), a correlation between items is considered satisfactory if the correlation coefficient is greater than 0.29, indicating that there is a medium or high correlation between the items. Satisfactory correlations are marked in bold. The numbers of the items refer to the specification presented in Table 1.

Table 4

Spearman correlation coefficient


Analyzing the interrelations between the items of the three quality factors (Table 4), 
we can observe that most of the item pairs have medium or high correlation regarding 
each quality factor. However, some item pairs have a low correlation (e.g., 1-2, 6-9, 
10-11). Even so, the results indicate evidence of convergent validity. 

On the other hand, some item pairs (e.g., 1-6, 3-6, 5-11) presented medium or high correlation with items of another quality factor. Thus, there is no evidence of discriminant validity. However, the lack of discriminant validity is acceptable because, although the model is divided into three quality factors, all factors are also related to a single factor, which is the perception of the quality of the IU.
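
The intercorrelation analysis described above can be sketched as follows. This is only an illustration under stated assumptions: responses are coded numerically, Spearman's coefficient is computed as the Pearson correlation of average ranks, and the helper names (`average_ranks`, `spearman_matrix`, `satisfactory_pairs`) as well as the toy data are hypothetical. The 0.29 threshold is the one cited from Cohen (1988).

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_matrix(scores):
    """Spearman correlation matrix for a (respondents x items) array."""
    scores = np.asarray(scores, dtype=float)
    ranked = np.column_stack([average_ranks(col) for col in scores.T])
    return np.corrcoef(ranked, rowvar=False)

def satisfactory_pairs(corr, threshold=0.29):
    """Item pairs (1-based numbering) whose coefficient exceeds the threshold."""
    n = corr.shape[0]
    return [(i + 1, j + 1)
            for i in range(n) for j in range(i + 1, n)
            if corr[i, j] > threshold]

# Hypothetical toy responses (4 respondents x 3 items), NOT data from the study.
toy = np.array([[1, 1, 2],
                [2, 1, 1],
                [3, 2, 2],
                [4, 2, 1]])
corr = spearman_matrix(toy)
pairs = satisfactory_pairs(corr)
print(np.round(corr, 3))
print(pairs)
```

Pairs returned for items of the same quality factor would count towards convergent validity, whereas pairs spanning different quality factors would count against discriminant validity.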

Item-total correlation. This method complements the previous one by evaluating the correlation of each item with all the other items. Each item of the instrument should have medium or high correlation with all the other items (DeVellis, 2016), as this indicates that the item is consistent with the other items. On the other hand, an item with a low item-total correlation undermines the validity of the scale and should therefore be eliminated. Table 5 shows the correlation coefficients between a single item and the other items of the measurement instrument.

We used the method of corrected item-total correlation, which compares one item with all the other items of the instrument, excluding itself. Reference values for the analysis are the same as presented in the previous section based on Cohen (1988), considering a correlation satisfactory if the correlation coefficient is greater than 0.29. Items with low correlation are marked in bold. In addition, Table 5 also shows the Cronbach’s alpha if an item was deleted, expecting that no item elimination should cause a substantial decrease in the Cronbach’s alpha (DeVellis, 2016).

Table 5

Corrected item-total correlation

Quality factor           No. Item   Corrected item-total correlation   Cronbach’s alpha, if item was deleted
Quality of IU            1          .511                               .764
                         2          .269                               .794
                         3          .459                               .769
Computing Experience     4          .431                               .773
                         5          .511                               .769
                         6          .594                               .753
                         7          .481                               .771
                         8          .338                               .783
                         9          .410                               .774
Perception of Learning   10         .280                               .787
                         11         .470                               .770
                         12         .415                               .773
                         13         .474                               .769


In general, item-total correlations are medium or high. Most items demonstrate an acceptable item-total correlation and satisfactory values of Cronbach’s alpha if the item was deleted, thus indicating the validity of the quality factors. Only items 2 (“The time of the workshop passed:”) and 10 (“The workshop was:”) presented a low item-total correlation. In addition, item 2 presents a small increase in Cronbach’s alpha if the item was deleted. Consequently, the results indicate that these items need to be reviewed.
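
A minimal sketch of the two statistics reported in Table 5 is given below. It assumes numerically coded responses and uses the Pearson correlation between an item and the sum of the remaining items, which is an assumption about how the corrected item-total correlation was computed. The function names and the toy responses are hypothetical; only the dictionary `table5` contains values taken from Table 5.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) array."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    return (k / (k - 1)) * (1.0 - scores.var(axis=0, ddof=1).sum()
                            / scores.sum(axis=1).var(ddof=1))

def corrected_item_total(scores):
    """Correlation of each item with the sum of all OTHER items."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    return np.array([np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
                     for j in range(scores.shape[1])])

def alpha_if_deleted(scores):
    """Cronbach's alpha recomputed with each item removed in turn."""
    scores = np.asarray(scores, dtype=float)
    return np.array([cronbach_alpha(np.delete(scores, j, axis=1))
                     for j in range(scores.shape[1])])

# Hypothetical toy responses (4 respondents x 3 items), NOT data from the study.
toy = np.array([[1, 1, 2],
                [2, 2, 2],
                [3, 3, 4],
                [4, 4, 4]])
item_total = corrected_item_total(toy)
deleted = alpha_if_deleted(toy)

# Corrected item-total correlations as reported in Table 5 (item number -> value).
table5 = {1: .511, 2: .269, 3: .459, 4: .431, 5: .511, 6: .594, 7: .481,
          8: .338, 9: .410, 10: .280, 11: .470, 12: .415, 13: .474}
# Items that do not exceed the 0.29 threshold cited from Cohen (1988).
low_items = [item for item, r in table5.items() if r <= 0.29]
print(low_items)
```

Applied to the reported values, the threshold check singles out the same two items that the text identifies as needing review.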

AQ3: How do underlying factors influence the responses on the items of the dETECT 
measurement instrument? 

In order to identify the number of factors (quality factors) that represent the responses to the set of 13 items of the dETECT measurement instrument, we performed a factor analysis.

To analyze whether the items of the dETECT measurement instrument can be submitted to a factor analysis (Brown, 2006), we used the Kaiser-Meyer-Olkin (KMO) index. This index indicates how appropriate a factor analysis is for a specific set of items (Brown, 2006). The KMO index measures the sampling adequacy with values between 0.0 and 1.0. An index value near 1.0 supports a factor analysis, and anything less than 0.5 is probably not amenable to useful factor analysis (Dziuban and Shirkey, 1974). Analyzing the set of items of the dETECT measurement instrument, we obtained a KMO index of .827. Consequently, factor analysis is appropriate in order to analyze the number of factors that represent the responses to the dETECT measurement instrument.
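
For illustration, the overall KMO index can be sketched from a correlation matrix as the ratio of the summed squared correlations to the sum of squared correlations plus squared partial correlations (off-diagonal elements only). The sketch below assumes an invertible correlation matrix; the function name `kmo_index` and the equicorrelated toy matrix are hypothetical and do not reproduce the study data.

```python
import numpy as np

def kmo_index(corr):
    """Overall Kaiser-Meyer-Olkin measure for an invertible correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)           # partial (anti-image) correlations
    off = ~np.eye(corr.shape[0], dtype=bool)  # mask selecting off-diagonal entries
    r2 = np.sum(corr[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

# Hypothetical correlation matrix of three items that all correlate at 0.5.
toy_corr = np.array([[1.0, 0.5, 0.5],
                     [0.5, 1.0, 0.5],
                     [0.5, 0.5, 1.0]])
kmo = kmo_index(toy_corr)

# Cut-off cited in the text (Dziuban and Shirkey, 1974).
suitable = kmo >= 0.5
print(round(kmo, 4), suitable)
```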

When running a factor analysis, the number of factors retained in the analysis has to be decided (Glorfeld, 1995; Brown, 2006). Here we used the Kaiser-Guttman criterion for this decision, as it is the most commonly used method of determining the number of factors. This method states that the number of factors is equal to the number of eigenvalues greater than 1 (Glorfeld, 1995). The eigenvalue refers to the amount of the variance of all the items that is explained by a factor (Glorfeld, 1995). Following the Kaiser-Guttman criterion, our results show that three factors should be retained in the analysis. Regarding the dETECT model, this means that the responses to the measurement instrument represent three underlying factors, indicating a decomposition similar to the original definition of the model.
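
The Kaiser-Guttman criterion can be sketched in a few lines, assuming that the eigenvalues are taken from the item correlation matrix. The function name `kaiser_guttman` and the block-structured toy matrix are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def kaiser_guttman(corr):
    """Number of eigenvalues of the correlation matrix that exceed 1."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr, dtype=float))[::-1]  # descending
    return int(np.sum(eigenvalues > 1.0)), eigenvalues

# Hypothetical correlation matrix: two pairs of items, each pair correlating at 0.6.
toy_corr = np.array([[1.0, 0.6, 0.0, 0.0],
                     [0.6, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.6],
                     [0.0, 0.0, 0.6, 1.0]])
n_factors, eigenvalues = kaiser_guttman(toy_corr)
print(n_factors, np.round(eigenvalues, 2))
```

The eigenvalues of a correlation matrix sum to the number of items, so each eigenvalue can be read as the share of total item variance attributed to one factor.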

Once the number of underlying factors is identified, another issue is to determine which items load on which factor. In order to identify the factor loadings of the items, a rotation method is used (Brown, 2006; Tabachnick and Fidel, 2007). Here we used the Varimax rotation method with Kaiser normalization, as it is the most widely accepted and used rotation method (Tabachnick and Fidel, 2007). Table 6 shows the factor loadings of the items associated with the three retained factors. The highest factor loading of each item, indicating to which factor the item is most related, is marked in bold.
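
To give an idea of what such a rotation does, the following is a sketch of a commonly used SVD-based iteration for the varimax criterion with optional Kaiser normalization. It is not necessarily the procedure implemented in the statistical package used in the study; the function name `varimax`, its parameters and the toy loadings are hypothetical, and items with zero communality are assumed not to occur.

```python
import numpy as np

def varimax(loadings, normalize=True, max_iter=500, tol=1e-10):
    """Orthogonal varimax rotation; returns (rotated loadings, rotation matrix)."""
    A = np.asarray(loadings, dtype=float)
    p, k = A.shape
    if normalize:
        # Kaiser normalization: scale each item (row) to unit length first.
        h = np.sqrt(np.sum(A ** 2, axis=1))
        A = A / h[:, None]
    R = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        L = A @ R
        # Gradient of the varimax criterion with respect to the rotation.
        G = A.T @ (L ** 3 - L @ np.diag(np.sum(L ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        current = s.sum()
        if previous != 0.0 and current < previous * (1 + tol):
            break
        previous = current
    rotated = A @ R
    if normalize:
        rotated = rotated * h[:, None]  # undo the row scaling
    return rotated, R

# Hypothetical unrotated loadings: a simple two-factor structure turned by 30 degrees.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]])
angle = np.deg2rad(30)
turn = np.array([[np.cos(angle), -np.sin(angle)],
                 [np.sin(angle),  np.cos(angle)]])
unrotated = simple @ turn
rotated, R = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, the communality of each item (the sum of its squared loadings) is unchanged; only the distribution of the loadings across factors changes.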

Analyzing the factor loadings of the items (Table 6), we can observe that the first factor (factor 1) includes a set of 7 items (4, 5, 6, 7, 8, 9 and 12). Thus, this factor is directly related to the quality factor of the computing experience provided by the instructional unit (Table 1). With the exception of item 12, all items correspond to the referred quality factor in the original structure of the dETECT model. Although item 12 has its highest factor loading on factor 1, it also presents a similar factor loading (.410) with respect to factor 3, thus showing that this item contributes to both quality factors (computing experience and perception of learning). Regarding factor 2, a set of three items (1, 2 and 3) is considered. This result suggests that these items are related to the quality of the instructional unit. In fact, these items correspond to the same quality factor (quality of the IU) in the original definition of the dETECT model (Table 1). Factor 3 includes a set of three items (10, 11 and 13), indicating that these items are related to a single quality factor (perception of learning).
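
The assignment of items to factors described above can be retraced with a short sketch that uses the rotated loadings reported in Table 6. Assigning each item to the factor with the largest absolute loading follows the text; the 0.05 margin used to flag items with two similar loadings is a hypothetical choice made only for this illustration.

```python
import numpy as np

# Rotated factor loadings as reported in Table 6
# (rows: items 1-13, columns: factors 1-3).
loadings = np.array([
    [ .146,  .763, -.018],
    [-.096,  .619,  .043],
    [ .110,  .800, -.078],
    [ .591,  .101,  .008],
    [ .571,  .217,  .035],
    [ .510,  .400,  .055],
    [ .683,  .114, -.053],
    [ .783, -.213, -.090],
    [ .401,  .130,  .165],
    [-.425,  .239,  .823],
    [ .415, -.177,  .546],
    [ .432, -.139,  .410],
    [ .230, -.102,  .720],
])

# Assign each item to the factor with the largest absolute loading.
dominant = np.argmax(np.abs(loadings), axis=1) + 1
groups = {f: [i + 1 for i in range(len(dominant)) if dominant[i] == f]
          for f in (1, 2, 3)}

# Flag items whose two largest absolute loadings differ by less than a
# hypothetical margin of 0.05 (illustrative choice, not from the study).
ordered = np.sort(np.abs(loadings), axis=1)
ambiguous = [i + 1 for i in range(len(ordered))
             if ordered[i, -1] - ordered[i, -2] < 0.05]

print(groups)
print(ambiguous)
```

The resulting groups match the item sets listed in the text, and item 12 is the only item flagged as loading similarly on two factors.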


Table 6

Factor loadings

Quality factor           Item no.   Description                                                  Factor 1   Factor 2   Factor 3
Quality of IU            1          The workshop was:                                             .146       .763      -.018
                         2          The time of the workshop passed:                             -.096       .619       .043
                         3          The workshop was:                                             .110       .800      -.078
Computing Experience     4          I will show my computer program to others:                    .591       .101       .008
                         5          I want to learn more about how to make computer programs:     .571       .217       .035
                         6          Making a computer program is:                                 .510       .400       .055
                         7          I like to make computer programs:                             .683       .114      -.053
                         8          Computing is useful in everyday life:                         .783      -.213      -.090
                         9          I want to learn more about how to make computer programs:     .401       .130       .165
Perception of Learning   10         The workshop was:                                            -.425       .239       .823
                         11         I can write computer programs:                                .415      -.177       .546
                         12         I can explain to a friend how to make a computer program:     .432      -.139       .410
                         13         Making a computer program is:                                 .230      -.102       .720


5. Discussion

The obtained results show sufficient evidence to consider the reliability and construct 
validity of dETECT as an acceptable model for the evaluation of instructional units for 
teaching computing in middle school. 

In terms of reliability (AQ1), the results of the analysis indicate an acceptable Cronbach’s alpha for all quality factors (α = .787), indicating the internal consistency of the dETECT measurement instrument. Thus, the items of the dETECT measurement instrument are consistent and precise with respect to the evaluation of instructional units for teaching computing.

In terms of construct validity, with regard to convergent validity (AQ2), we identified that most items have medium or high correlation, mainly between items of the same quality factor (e.g., quality of IU, computing experience, and perception of learning). In this way, we can conclude that there is evidence of convergent validity considering the quality factors. This indicates that the items of the measurement instrument seem to be actually measuring what they intend to measure (e.g., quality of IU, computing experience, and perception of learning). However, some items have a low correlation, both within a single quality factor and in relation to the other factors (e.g., items 4-9). This may be due to the description of the items derived from the ones found in the literature and, thus, may indicate that these items need to be revised.

With respect to discriminant validity, most of the items present a low correlation with 
items of other quality factors. However, some item pairs (e.g., 1-6, 5-11) have a medium 
or high correlation although they belong to different quality factors. Thus, the results 
do not provide evidence of discriminant validity. In this case, however, the lack of 
discriminant validity is acceptable: although the model is divided into three quality 
factors, all of them relate to a single overall factor, the perception of the quality of 
the instructional unit, as proposed in the original composition of the dETECT model (Fig. 2). 
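
The comparison behind convergent and discriminant validity can be sketched as follows: given an 
inter-item correlation matrix and the quality factor of each item, average the correlations 
of item pairs within the same factor and of pairs across factors. The function name, the 
factor labels "A" and "B", and the 4-item matrix are hypothetical and serve only as an 
illustration; they are not results of this study. 

```python
def mean_within_between(corr, factor_of):
    """Mean correlation of same-factor pairs and of cross-factor pairs.

    corr      -- symmetric correlation matrix as a list of lists
    factor_of -- factor label of each item, in the same order as corr
    """
    within, between = [], []
    n = len(corr)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            if factor_of[i] == factor_of[j]:
                within.append(corr[i][j])
            else:
                between.append(corr[i][j])
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical correlation matrix for 4 items in 2 factors (illustrative only).
toy_corr = [
    [1.0, 0.6, 0.2, 0.1],
    [0.6, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.5],
    [0.1, 0.2, 0.5, 1.0],
]
toy_factors = ["A", "A", "B", "B"]
within_mean, between_mean = mean_within_between(toy_corr, toy_factors)
print(within_mean, between_mean)
```

A within-factor mean clearly above the between-factor mean is the pattern expected when both 
convergent and discriminant validity hold. 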

Analyzing the item-total correlation, the majority of the items again present a satisfactory 
correlation with the other items of the measurement instrument. This indicates that the 
items of the dETECT measurement instrument are related and measure what they propose to 
measure (perception of the quality of an IU). 
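
One common way to compute an item-total correlation is the corrected variant, in which each 
item is correlated with the sum of all other items. The sketch below assumes this variant (the 
text does not state which one was used); the function names and toy data are hypothetical. 

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def corrected_item_total(responses):
    """Correlate each item with the total of the remaining items."""
    k = len(responses[0])
    result = []
    for i in range(k):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]  # total without item i
        result.append(pearson(item, rest))
    return result


# Hypothetical toy data: 5 students x 3 items (NOT the dETECT responses).
toy_responses = [
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 2],
    [4, 3, 4],
    [5, 5, 4],
]
item_total = corrected_item_total(toy_responses)
print([round(r, 3) for r in item_total])
```

Excluding the item from its own total avoids inflating the correlation, which matters most 
for short instruments. 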

Based on the results of the factor analysis (AQ3), we identified that the data collected 
in the case studies are explained by three factors. This confirms the initial structure 
defined for the dETECT model, clearly grouping the items according to their defined 
quality factor (quality of IU, computing experience and perception of learning). 
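
The grouping of items by factor can be read off a loading table mechanically. The sketch below 
uses the loadings reported for items 4-13, taken in the column order in which they appear; 
which factor each column represents is an assumption here, and the 0.40 cross-loading cutoff 
and the function names are hypothetical choices, not taken from the analysis itself. 

```python
# Loadings for items 4-13 as reported, in column order. The mapping of columns
# to named factors is an assumption; column headers are not reproduced here.
LOADINGS = {
    4: (0.591, 0.101, 0.008),
    5: (0.571, 0.217, 0.035),
    6: (0.510, 0.400, 0.055),
    7: (0.683, 0.114, -0.053),
    8: (0.783, -0.213, -0.090),
    9: (0.401, 0.130, 0.165),
    10: (-0.425, 0.239, 0.823),
    11: (0.415, -0.177, 0.546),
    12: (0.432, -0.139, 0.410),
    13: (0.230, -0.102, 0.720),
}

CUTOFF = 0.40  # hypothetical cross-loading threshold


def dominant_column(row):
    """0-based index of the column with the largest absolute loading."""
    return max(range(len(row)), key=lambda c: abs(row[c]))


def cross_loadings(row, cutoff=CUTOFF):
    """Other columns whose absolute loading reaches the cutoff."""
    top = dominant_column(row)
    return [c for c, v in enumerate(row) if c != top and abs(v) >= cutoff]


for item, row in LOADINGS.items():
    # Print 1-based column numbers for readability.
    print(item, dominant_column(row) + 1, [c + 1 for c in cross_loadings(row)])
```

With the values as transcribed, items 4 to 9 peak in the first column and items 10, 11 and 13 
in the third. Item 12 has two nearly equal loadings (.432 and .410), so its assignment is 
sensitive to small changes in the data. 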

Threats to validity 

Due to the characteristics of this type of research, this work is subject to various 
threats to validity. We therefore identified potential threats and applied mitigation 
strategies in order to minimize their impact on our research. Some threats are related to 
the design of the study. In order to mitigate them, we defined and documented a systematic 
methodology for our study. The dETECT model was defined based on the GQM approach, 
systematically decomposing the evaluation objective into analysis questions and measures. 
The measurement instrument was developed following scale and questionnaire development 
methods defined in the literature and involving a multidisciplinary team of researchers. 
In addition, for the evaluation of the dETECT measurement instrument, a case study was 
systematically defined and documented. Another risk refers to the quality of the data 
pooled into a single sample, in terms of standardization of the data (response format) 
and adequacy to the dETECT model. As our study is limited exclusively to evaluations that 
used the dETECT model, this risk is minimized, since the same data collection instrument 
was used in all studies. Another issue refers to pooling data from different contexts. To 
mitigate this threat, we selected only case studies of IUs for teaching computing in 
similar contexts. 

In terms of external validity, a threat to the generalizability of the results is 
related to the sample size and diversity of the data used for the evaluation. With respect 
to sample size, our evaluation used data collected from 16 case studies evaluating three 
different instructional units, involving a population of 477 students. In terms of 
statistical significance, this is a satisfactory sample size, allowing the generation of 
significant results (Clark and Watson, 1995; MacCallum et al., 1999; Kasunic, 2005; 
DeVellis, 2016). 

In terms of reliability, a threat refers to the extent to which the data and the analysis 
depend on the specific researchers. In order to mitigate this threat, we systematically 
documented the development and evaluation of the dETECT model, clearly defining the 
study objective, the process of data collection, and the statistical methods used for data 
analysis. Another issue refers to the correct choice of statistical tests for data analysis. To 
minimize this threat, we performed a statistical evaluation based on the approach for the 
construction of measurement scales as proposed by DeVellis (2016), which is aligned 
with procedures for the evaluation of internal consistency and construct validity of a 
measurement instrument (Trochim and Donnelly, 2008). 


6. Conclusion 

Although the evaluation of instructional units for teaching computing is essential for 
their continuous improvement and effective and efficient application, few efforts have 
been made to develop evaluation models. In this context, this article presents a first 
step in this direction, also taking into consideration the practical limitations of 
running such evaluations in more informal outreach programs. Based on the literature and 
practical experience, the evaluation model dETECT and its 13-item measurement instrument 
have been developed systematically and applied at the end of 16 instructional units in 
middle schools in Brazil. 

Results from the analysis of the responses of 477 students indicate that the measurement 
instrument is acceptable in terms of reliability and construct validity. With respect to 
reliability, a Cronbach’s alpha of α = .787 indicates an acceptable internal consistency, 
which means that the responses to the items are consistent and precise. Our analysis also 
indicates convergent validity through an acceptable degree of correlation found between 
almost all items regarding the quality factors. This suggests that the measurement 
instrument of the dETECT model can be a reliable and valid instrument for measuring the 
students’ perception of instructional units for teaching computing. The results of the 
factor analysis indicate that three underlying factors influence the responses to the 
items of the dETECT measurement instrument, confirming the original structure of the 
model, which defines three quality factors (quality of IU, computing experience and 
perception of learning) for the evaluation of instructional units. 